Quantitative Analysis of Genealogy Using Digitised Family Trees
Abstract
Driven by the popularity of television shows such as Who Do You Think You Are?, many millions of users have uploaded their family trees to web projects such as WikiTree [1]. Analysis of this corpus enables us to investigate genealogy computationally. The study of heritage in the social sciences has led to an increased understanding of ancestry and descent [2], but such efforts are hampered by difficult-to-access data [3]. Genealogical research is typically a tedious process that involves trawling through sources such as birth and death certificates, wills, letters and land deeds [4]. Decades of research have developed and examined hypotheses on population sex ratios, marriage trends, fertility, lifespan, and the frequency of twins and triplets. These can now be tested on vast datasets containing many billions of entries using machine learning tools. Here we survey the use of genealogy data mining using family trees dating back centuries and featuring profiles of nearly 7 million individuals based in over 160 countries. These data are not typically created by trained genealogists, so we verify them with reference to third-party censuses. We present results on a range of aspects of population dynamics. Our approach extends the boundaries of genealogical inquiry to the precise measurement of underlying human phenomena.
Related resources
Quantitative analysis of population-scale family trees with millions of relatives.
Family trees have vast applications in multiple fields from genetics to anthropology and economics. However, the collection of extended family trees is tedious and usually relies on resources with limited geographical scope and complex data usage restrictions. Here, we collected 86 million profiles from publicly-available online data shared by genealogy enthusiasts. After extensive cleaning and...
Quantitative Comparison of Tree Pairs Resulted from Gene and Protein Phylogenetic Trees for Sulfite Reductase Flavoprotein Alpha-Component and 5S rRNA and Taxonomic Trees in Selected Bacterial Species
Introduction: FAD is the cofactor of FAD-FR protein family. Sulfite reductase flavoprotein alpha-component is one of the main enzymes of this family. Based on applications of this enzyme in biotechnology and industry, it was chosen as the subject of evolutionary studies in 19 specific species. Method: Gene and protein sequences of sulfite reductase flavoprotein alpha-component, 5S rRNA sequence...
COSC 460 Improving Face
Face recognition has long been an area of great interest within computer science, and as face recognition implementations become more sophisticated, the scope of real-world applications has widened. The field of genealogy has embraced the move towards digitisation, with increasingly large quantities of historical photographs being digitised in an effort to both preserve and share them with a wi...
ELIJAH, Extracting Genealogy from the Web
On-line genealogy is becoming America’s latest craze [1]. The LDS church’s FamilySearch website contains only a fraction of the information that is available on the Web. People around the world, both members and non-members, have posted family trees and other genealogical information on tens of thousands of websites throughout the Web. Although the genealogical data posted by individuals repres...
Journal: CoRR
Volume: abs/1408.5571
Publication date: 2014